observability: phase 6.1 PR-C — 4 conformance fixtures#24
Merged
Conversation
Drive four spec observability conformance fixtures end-to-end
through the existing test_observability harness:
- 002-otel-subgraph-hierarchy: §4.5 synthetic dispatch span
parents under invocation; inner-node spans parent under
dispatch span with the nested ["outer_sub", "inner_x"] /
["outer_sub", "inner_y"] namespace.
- 003-otel-error-status: §4.2 ERROR status mapping for the
node_exception case (status_description == "node_exception",
recorded exception event, openarmature.error.category
attribute, ok_node stays OK).
- 007-otel-retry-attempt-spans: both sub-cases. Three sibling
attempt spans with distinct attempt_index 0..2, single shared
invocation parent. ERROR/ERROR/OK on retry-succeeds-third;
ERROR x3 on retry-exhausts.
- 011-otel-determinism: signature-tuple comparison
(name, status_code, sorted attrs minus the canonical
non-deterministic-by-design ignore set: invocation_id +
auto-generated correlation_id) across two runs.
Two minor YAML-shape translations kept inline in the
fixture-specific drivers so the existing adapter stays
unchanged (the adapter is shared with other suites; this PR
doesn't touch it):
- Fixture 007 flaky.fail_count=N rewritten to the adapter's
failure_sequence shape before build_graph.
- Fixture 011 when:{field, gt: 0} edge syntax rewritten to the
adapter's condition:{if_field, equals, then, else} shape via
_translate_011_when_edges (gt:0 with the fixture's
deterministic input is functionally equivalent to equals:1
because counter==1 always under that input).
Map updates:
- _SUPPORTED_FIXTURES grows from 4 to 8.
- _DEFERRED_FIXTURES shrinks from 7 to 3 (004 / 006 / 010);
each remaining note now spells out the specific gating-PR +
spec/architectural rationale (PR-C.1 + proposal 0012 /
PR-C.2 / PR-C.3) rather than the generic "deferred to
Phase 6.1" placeholder.
Module docstring rewritten to distinguish the 8 driven
fixtures from the 3 deferred ones.
No engine, observer, or correlation changes. Pure test-side
work.
392 tests pass (was 388; net +4 from the new fixture drivers).
5 skipped (3 deferred conformance fixtures + 2 pre-existing).
Pyright clean.
There was a problem hiding this comment.
Pull request overview
This PR extends the tests/conformance/test_observability.py conformance harness to drive four additional OpenTelemetry observability fixtures end-to-end (002, 003, 007, 011) and updates the deferred-fixture notes to include more specific gating rationale.
Changes:
- Add drivers and assertions for fixtures 002 (subgraph hierarchy), 003 (error status), 007 (retry attempt spans), and 011 (determinism).
- Refine deferred-fixture skip messages with concrete gating requirements and follow-up PR pointers.
- Add small fixture-specific YAML translations inside the test driver for fixtures 007 and 011.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix invocation span status propagation. The OTelObserver was
unconditionally calling set_status(OK) on the invocation span
in _close_invocation_span; OTel doesn't auto-propagate child
status to parents, so the spec §4.2 / fixture 003 contract
("invocation span ends ERROR when a child errored") wasn't
satisfied. _handle_completed now sets ERROR on the invocation
span when a child errors; _close_invocation_span no longer
forces OK (clean invocations end UNSET, which exporters map
to OK; ERROR is preserved by OTel SDK status precedence).
Driver _run_fixture_003 gains the explicit invocation-status
ERROR assertion + docstring fix.
- Fix _signature in _run_fixture_011 to include parent_name.
Previously the docstring claimed "hierarchy" and the comment
said "parent-by-name" but the signature ignored span.parent
entirely. Now looks up parent.span_id in a per-run by-id map
and includes the parent's name; a hierarchy regression where
a node reparented to a different ancestor now surfaces as a
signature divergence.
- Fix comment drift in _run_fixture_011_case: the translated
edge condition is "equals: 1" (matching the deterministic
counter == 1 input), not "equals: True".
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-C of Phase 6.1: drive four spec observability conformance
fixtures end-to-end through the existing
test_observabilityharness. Pure test-side work — noengine, observer, or correlation changes.
Driven (this PR)
dispatch span parents under the invocation; inner-node spans
parent under the dispatch span with the nested
["outer_sub", "inner_x"]/["outer_sub", "inner_y"]namespace.
the
node_exceptioncase(
status_description == "node_exception", recordedexceptionevent,openarmature.error.categoryattribute,
ok_nodestays OK).sibling attempt spans with distinct
attempt_index 0..2,single shared invocation parent.
ERROR/ERROR/OKon retry-succeeds-third;ERROR×3on retry-exhausts.(name, status_code, sorted attrs minus the canonical
non-deterministic-by-design ignore set:
invocation_id+auto-generated
correlation_id) across two runs of thesame graph.
Deferred to follow-up PRs
Each remaining
_DEFERRED_FIXTURESentry now spells out thespecific gating-PR + spec/architectural rationale rather than
the generic "deferred to Phase 6.1" placeholder:
proposal 0012's spec amendment + v0.9.0 spec submodule
bump). Routing-error attribution requires the §3/§6 ordering
swap so
RoutingErrorlands on the preceding node'scompletedevent witherrorpopulated.Spec §5.4 mandates per-instance dispatch spans nested between
the fan-out node span and the inner-node spans, plus three
fan-out-config attributes the observer doesn't currently
surface (
item_count,concurrency,error_policy).Current observer opens one shared synthetic span at the
namespace prefix instead.
Needs synchronous observer prep hook (
prepare_sync) sothe engine task can attach the observer's span to OTel
context for the duration of node-body execution. Observer
span creation runs on the worker task today and isn't
available synchronously after
_dispatch_started.PR-C.1 / PR-C.2 / PR-C.3 are mutually independent and land in
any order once their respective gates clear (PR-C.2 + PR-C.3
have no gate; PR-C.1 waits on the spec submodule bump).
Two minor YAML translations kept inline
Both fixture-specific so the shared adapter stays unchanged
(the adapter is used by other test suites; this PR doesn't
touch it):
flaky.fail_count: Nrewritten to theadapter's
failure_sequenceshape beforebuild_graph.when: {field: counter, gt: 0}edge syntaxrewritten to the adapter's
condition: {if_field, equals, then, else}shape via_translate_011_when_edges(the deterministic input meanscounter == 1always, sogt: 0is functionallyequivalent to
equals: 1for this fixture's flow).Spec agent flagged the harness-DSL gap as a small non-blocking
follow-up — extend
tests/conformance/adapter.pyedge-conditionDSL with
gt/gte/lt/lte/in/not_inso future fixtures with arithmetic edge conditions don't need
per-driver translation. Tracked separately under harness backlog
in
docs/phase-6-1-conformance-fillin.md.Test plan
uv run pytest -q— 392 passed (was 388; net +4 fromthe new fixture drivers), 5 skipped (3 deferred
conformance fixtures + 2 pre-existing)
uv run pyright— 0 errorsuv run ruff check . && uv run ruff format— cleanfile (
tests/conformance/test_observability.py)